Abstract

Two old conjectures from problem sections, one of which is from SIAM Review, concern the question of finding distributions that maximize $P(S_n \le t)$, where $S_n$ is the sum of i.i.d. random variables $X_1, \ldots, X_n$ on the interval $[0,1]$, satisfying $E[X_1] = m$. In this paper a Lagrange multiplier technique is applied to this problem, yielding necessary conditions for distributions to be extremal, for arbitrary $n$. For $n = 2$, a complete solution is derived from them: extremal distributions are discrete and have one of the following supports, depending on $m$ and $t$: $\{0, t\}$, $\{t-1, 1\}$, $\{t/2, 1\}$, or $\{0, t, 1\}$. These results suffice to refute both conjectures. However, acquired insight naturally leads to a revised conjecture: that extremal distributions always have at most three support points and belong to a (for each $n$, specified) finite collection of two and three point distributions.

1 Two unsolved problems

The problem section of the June 1986 issue of SIAM Review lists the following, labeled Problem 86-6* [5]:

In many audit populations items may have partial errors. Suppose each item in the population has an error size known to be in the interval $[0,1]$. Suppose the mean population error is $m$ where $0 < m < 1$. A simple random sample of size $n$ is drawn with replacement from that population. Let $S_n$ be the random variable representing the sum of the error sizes of the $n$ sampled items. Given a constant $t < mn$ how should the error sizes be distributed in the population to maximize $P(S_n \le t)$? It is conjectured that for each $m$ and $t$, there is a population with just two error sizes, one of which is 0 or 1, such that $P(S_n \le t)$ is maximized. Prove or disprove.

* A preliminary version of this paper was presented at IWAP2008, Compiegne, France.

It is added that if this conjecture is true, then it will be possible to determine simple bounds on upper confidence limits for some audit sampling problems.

The problem section of Statistica Neerlandica, Vol. 47, no. 1, lists the following as Problem 294 [1]:

Consider i.i.d. random variables $X_1, \ldots, X_n$ with $0 \le X_i \le 1$ and $E[X_i] = m$ given. Let $S_n = X_1 + \cdots + X_n$. Consider the following statement:

$$P = P(S_n \le t) \text{ is maximal if } P(X_i = 1) = 1 - P(X_i = 0) = m.$$

Show that this statement holds for all $t$ such that $P \le p_0$ for some $p_0 < 1$ and find such a value for $p_0$.

It appears that no solutions to these problems have been published. This paper addresses them and presents a (partial) solution by considering:

Let $0 < m < 1$ and let $\mathcal{D}_m$ be the set of probability measures on $[0,1]$ with mean $m$. Let $X_1, \ldots, X_n$ be i.i.d. $\mu \in \mathcal{D}_m$ and $S_n = X_1 + \cdots + X_n$. Determine $p(m,t) = \sup_{\mu \in \mathcal{D}_m} P_\mu(S_n \le t)$ and, if possible, (all) $\mu$ attaining the maximum.

Note that $p(m,t) = 1$ for $mn \le t$: set $X_i = m$, all $i$; for $mn < t$ the question of maximizing $P(S_n \ge t)$ would be more natural. However, since $n - S_n$ is the sum of the i.i.d. variables $1 - X_i$ with mean $1 - m$, and $mn < t < n$ implies $n - t < n(1-m)$, one has $P(S_n \ge t) = P(n - S_n \le n - t) \le p(1-m, n-t)$, so this case is included in the problem statement; $t = mn$ once again is the trivial case. So henceforth, $0 < t < mn$ is assumed.
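In the sequel, $P_\mu$ will be used when emphasis on the dependence on $\mu$ is required. The supremum $p(m,t)$ is indeed attained by an element of $\mathcal{D}_m$ by Weierstrass' theorem, because the set $\mathcal{D}_m$ is weak*-compact and $\mu \mapsto P_\mu(S_n \le t)$ is weak*-upper semicontinuous (the event $\{S_n \le t\}$ corresponds to a closed set), which suffices for the maximum to be attained. A $\mu \in \mathcal{D}_m$ is called extremal (for certain $n$, $m$ and $t$) if $P_\mu(S_n \le t) = p(m,t)$.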

The problem at hand satisfies a common rule: $n = 1$ is trivial, $n = 2$ can be solved with a reasonable amount of work, and $n \ge 3$ is hard. After the $n = 1$ case, we start with some general observations. After that, some relevant results from the literature are discussed, which show that part of the $n = 2$ case follows from a 1955 paper by Hoeffding and Shrikhande [3]. We embark on a different approach, applying a Lagrange multiplier technique from Mattner [4], in Section 2, for arbitrary $n$. The resulting Lagrange conditions provide necessary conditions for distributions to be extremal. For the $n = 2$ case, this allowed us to show that supports of extremal distributions necessarily look like $\{s, t-s\}$ or $\{s, t-s, 1\}$, for some $s$. After this reduction, shown in Section 3, the search for extremal distributions may be restricted to this more manageable class and a complete analysis is carried out. As it turns out, the conjectures stated in the problems above can be refuted based on these results (Section 4). The first conjectured solution, however, seems "almost true" and its exception is understandable, so in Section 4.1 a revised and sharpened conjecture is formulated, including the specification, for each $n$, of a collection of distributions of which the extremal one is conjectured to be a member.



1.1 The case n = 1

Markov's inequality implies that

$$P(X \le t) = P(1 - X \ge 1 - t) \le \frac{E[1 - X]}{1 - t} = \frac{1 - m}{1 - t}, \tag{1}$$

and this upper bound is attained by the following two-point distribution: $P(X = t) = (1-m)/(1-t)$ and $P(X = 1) = (m-t)/(1-t)$.
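As an informal check of the $n = 1$ case, the following Python sketch (standard library only; the function names are illustrative and not taken from the paper) builds the two-point distribution on $\{t, 1\}$ and compares its mean and left tail with $m$ and with the bound (1).

```python
def markov_bound(m, t):
    """Right-hand side of (1): (1 - m) / (1 - t), for 0 < t < m < 1."""
    return (1 - m) / (1 - t)


def n1_extremal(m, t):
    """Two-point distribution on {t, 1} that attains the bound (1)."""
    p_t = (1 - m) / (1 - t)
    return [t, 1.0], [p_t, 1 - p_t]


m, t = 0.6, 0.25
support, probs = n1_extremal(m, t)
mean = sum(x * p for x, p in zip(support, probs))
tail = sum(p for x, p in zip(support, probs) if x <= t)  # P(X <= t)
print(mean, tail, markov_bound(m, t))
```

The mean equals $m$ and the tail equals the Markov bound, up to floating-point rounding.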



1.2 Some results from the literature

Hoeffding and Shrikhande [3] obtained results on the supremum of the distribution function of the i.i.d. sum of two random variables, given $k$ moment conditions and a restricted range. They showed that the supremum over all such distributions is the same as that over all discrete distributions with at most $2k + 2$ support points. In addition, they provide the following bound for nonnegative i.i.d. $X_1$ and $X_2$ with $E[X_1] = \gamma$:



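$$P(X_1 + X_2 \ge c\gamma) \le \begin{cases} 1 & \text{if } c \le 2; \\ 4/c^2 & \text{if } 2 \le c \le 5/2; \\ 2/c - 1/c^2 & \text{if } 5/2 \le c. \end{cases} \tag{2}$$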



For i.i.d. $X_1$ and $X_2$ on $[0,1]$, with $E[X_1] = m$, one may translate the above result to one on the left tail, by switching to the complements with respect to 1:

$$P(X_1 + X_2 \le t) = P(1 - X_1 + 1 - X_2 \ge 2 - t),$$

and (2) applies with $\gamma = 1 - m$ and $c = (2 - t)/(1 - m)$. The distributions attaining the resulting bound have as their support, respectively, $\{m\}$, $\{t/2, 1\}$, and $\{t-1, 1\}$. The first applies to $c \le 2$, or $t \ge 2m$, the trivial case; the second to $2 \le c \le 5/2$, or $t/2 \le m \le (2t+1)/5$; the third to $5/2 \le c$, or: $t \ge 1$ and $(2t+1)/5 \le m \le 1$. These results resolve the $n = 2$ case only for a subset of $(m,t)$-values. Moreover, Hoeffding and Shrikhande [3] did not address the question of uniqueness of the extremal distributions.
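The translation of (2) to the left tail can be checked numerically. The sketch below (Python, standard library; helper names are ours) evaluates the bound with $\gamma = 1 - m$ and $c = (2-t)/(1-m)$ and compares it with $P(X_1 + X_2 \le t)$ computed directly for the attaining supports $\{t/2, 1\}$ and $\{t-1, 1\}$, at one hypothetical $(m,t)$-pair in each regime.

```python
def hs_bound(gamma, c):
    """Bound (2) on P(X_1 + X_2 >= c * gamma) for nonnegative i.i.d. X_i with mean gamma."""
    if c <= 2:
        return 1.0
    if c <= 2.5:
        return 4 / c**2
    return 2 / c - 1 / c**2


def left_tail_bound(m, t):
    """Bound on P(X_1 + X_2 <= t) for X_i in [0, 1] with mean m, via X -> 1 - X."""
    return hs_bound(1 - m, (2 - t) / (1 - m))


def prob_s2_le(support, probs, t, tol=1e-12):
    """P(X_1 + X_2 <= t) for i.i.d. X_i with a finite distribution."""
    return sum(p * q for x, p in zip(support, probs)
               for y, q in zip(support, probs) if x + y <= t + tol)


# Middle regime t/2 <= m <= (2t + 1)/5: the bound should be attained on {t/2, 1}.
m, t = 0.45, 0.8
pi_half = (1 - m) / (1 - t / 2)
val_mid = prob_s2_le([t / 2, 1.0], [pi_half, 1 - pi_half], t)

# Last regime t >= 1, (2t + 1)/5 <= m <= 1: the bound should be attained on {t - 1, 1}.
m2, t2 = 0.9, 1.3
pi_low = (1 - m2) / (2 - t2)
val_last = prob_s2_le([t2 - 1, 1.0], [pi_low, 1 - pi_low], t2)
```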
Hoeffding's well-known inequality (see [2]) bounds deviations from the expected value for the average of independent, not necessarily identically distributed, random variables. The author states that the bound is not optimal, but the best bound that can be obtained via his method, based on the moment generating function. Applying Theorem 1 of [2] to the variables $1 - X_i$, one obtains, with $x = t/n < m$:

$$P(S_n \le t) \le \left[ \left(\frac{m}{x}\right)^{x} \left(\frac{1-m}{1-x}\right)^{1-x} \right]^{n}. \tag{3}$$

For $n = 1$ it is clear that (1) is sharper, since $m > t$: the ratio of (1) to (3) equals $\left(\frac{t(1-m)}{m(1-t)}\right)^{t} < 1$.
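A quick numerical comparison for $n = 1$ follows. It is a sketch that assumes (3) is Hoeffding's Theorem 1 bound $\left[(m/x)^x ((1-m)/(1-x))^{1-x}\right]^n$ with $x = t/n$, as obtained by applying that theorem to the variables $1 - X_i$; the function names are ours.

```python
def hoeffding_bound(m, t, n):
    """Bound (3) as read here: [(m/x)^x ((1-m)/(1-x))^(1-x)]^n with x = t/n, 0 < t < mn."""
    x = t / n
    return ((m / x) ** x * ((1 - m) / (1 - x)) ** (1 - x)) ** n


def markov_bound(m, t):
    """Bound (1) for n = 1."""
    return (1 - m) / (1 - t)


# For n = 1, bound (1) should be strictly smaller (sharper) than (3) whenever t < m.
pairs = [(m, t) for m in (0.2, 0.5, 0.8) for t in (0.05, 0.15, 0.19) if t < m]
ratios = [markov_bound(m, t) / hoeffding_bound(m, t, 1) for m, t in pairs]
```

Each ratio agrees with the closed form $\left(t(1-m)/(m(1-t))\right)^t$ and lies below 1.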
1.3 Some general observations 

One may assume 0 or 1 to be in the support. Recall the definition of the support of a measure $\mu$: the smallest closed set with measure 1; we denote it by $\operatorname{supp}(\mu)$. Suppose, for some extremal $\mu$, one has $\operatorname{supp}(\mu) \subset [a,b]$ with $0 < a < b < 1$. Let $X_i$ have distribution $\mu$ and $Z_i = m + \alpha(X_i - m)$, $i = 1, \ldots, n$. Then $E[Z_i] = m$ and there exist $\alpha > 1$ such that $\operatorname{supp}(Z_i) \subset [0,1]$. Writing $S_n^* = Z_1 + \cdots + Z_n = \alpha S_n - (\alpha - 1)mn$ one has for $t < mn$:

$$P(S_n^* \le t) = P\left(S_n \le \tfrac{1}{\alpha}\, t + \tfrac{\alpha - 1}{\alpha}\, mn\right) \ge P(S_n \le t),$$

since $\tfrac{1}{\alpha} t + \tfrac{\alpha - 1}{\alpha} mn - t = \tfrac{\alpha - 1}{\alpha}(mn - t) > 0$. Thus, if $\mu$ is extremal, a measure $\mu^*$ can be found that is extremal as well, and if the largest $\alpha$ is chosen that satisfies $\operatorname{supp}(\mu^*) \subset [0,1]$, then $\operatorname{supp}(\mu^*)$ will contain 0 or 1.

The supremum $p(m,t)$ is non-increasing in $m$. Fix $n$ and $t$. Suppose $m_1 < m_2$ and $\mu_2$ attains the maximum value $p(m_2,t)$. Suppose $\mu_2$ has distribution function $G$. Define, for $0 \le r \le 1$, the distribution function $F_r$ by $F_r(x) = \max(r, G(x))$. As $r$ goes from 0 to 1 the expectation of the corresponding distribution decreases continuously from $m_2$ to 0, so for some $r$ the corresponding distribution $\mu_1$ has expectation $m_1$. This measure is stochastically smaller than $\mu_2$. Hence, $p(m_1,t) \ge P_{\mu_1}(S_n \le t) \ge P_{\mu_2}(S_n \le t) = p(m_2,t)$.

Subprobability measures with expectation at least $m$. Instead of taking the supremum over $\mathcal{D}_m$ one could take the supremum over the set of all subprobability measures on $[0,1]$ with mean at least $m$, and the same $p(m,t)$ would result. In order to show this, suppose $\nu$ is a subprobability measure on $[0,1]$ with mass defect $d = 1 - \nu([0,1])$ and mean $r = \int_0^1 x\,\nu(dx) \ge m$. Then

$$P_\nu(S_n \le t) \le P_{\nu + d\delta_0}(S_n \le t) \le p(r,t) \le p(m,t),$$

where the first inequality states that putting the mass defect in 0 will lead to an improvement, and the last inequality follows from the non-increasingness proved above. Considering that $d > 0$ or $r > m$ will make at least one of the inequalities strict, it is clear that $p(m,t)$ can only be attained with $d = 0$ and $r = m$.
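Both constructions of this section can be tried on a small discrete example. In the Python sketch below (standard library; the example distribution and all helper names are hypothetical), `stretch` applies $x \mapsto m + \alpha(x - m)$ with the largest admissible $\alpha$, and `push_to_zero` realizes $F_r = \max(r, G)$ by moving the lowest probability mass $r$ to 0; a bisection then picks $r$ so that the new mean equals a smaller $m_1$.

```python
from itertools import product


def prob_sum_le(support, probs, n, t, tol=1e-12):
    """P(X_1 + ... + X_n <= t) for i.i.d. X_i with a finite distribution (brute force)."""
    total = 0.0
    for idx in product(range(len(support)), repeat=n):
        if sum(support[i] for i in idx) <= t + tol:
            weight = 1.0
            for i in idx:
                weight *= probs[i]
            total += weight
    return total


def mean(support, probs):
    return sum(x * p for x, p in zip(support, probs))


def stretch(support, m):
    """Z = m + alpha (X - m) with the largest alpha keeping the support inside [0, 1]."""
    lo, hi = min(support), max(support)
    alpha = min(m / (m - lo), (1 - m) / (hi - m))
    return [m + alpha * (x - m) for x in support]


def push_to_zero(support, probs, r):
    """Distribution with cdf F_r = max(r, G): the lowest probability mass r is moved to 0."""
    new_support, new_probs, lower = [0.0], [r], 0.0
    for x, p in sorted(zip(support, probs)):
        upper = lower + p
        kept = max(0.0, upper - max(lower, r))  # part of this atom above level r
        if kept > 0:
            new_support.append(x)
            new_probs.append(kept)
        lower = upper
    return new_support, new_probs


# Hypothetical distribution G with mean m = 0.53; n = 3 and t = 1.0 < m n.
support, probs = [0.2, 0.5, 0.9], [0.3, 0.4, 0.3]
m, n, t = mean(support, probs), 3, 1.0
before = prob_sum_le(support, probs, n, t)
stretched = stretch(support, m)
after = prob_sum_le(stretched, probs, n, t)

# Bisection for r such that F_r has the smaller mean m1 (the mean decreases continuously in r).
m1, lo, hi = 0.4, 0.0, 1.0
for _ in range(60):
    r = (lo + hi) / 2
    if mean(*push_to_zero(support, probs, r)) > m1:
        lo = r
    else:
        hi = r
smaller = push_to_zero(support, probs, (lo + hi) / 2)
```

The stretched distribution keeps mean $m$, reaches 0 or 1, and does not decrease $P(S_n \le t)$; the stochastically smaller distribution with mean $m_1$ gives a value at least as large.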



An upper confidence bound on $m$. Using $S_n$ as test statistic one may define a non-parametric confidence bound on $m$, as follows (writing $p_n$ for $p$ to stress the dependence on $n$). Let $m_u$ be the solution of $p_n(m,t) = \alpha$. By the non-increasingness proved above, this implies that $p_n(m,t) \le \alpha$ for $m \ge m_u$. So, for $m \ge m_u$ it follows that $P(S_n \le t) \le p_n(m,t) \le p_n(m_u,t) = \alpha$.
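For $n = 2$ the bound $m_u$ can be computed by bisection. The sketch below assumes the $n = 2$ results of Section 3, taking $p_2(m,t)$ as the largest value among the feasible candidate supports $\{t/2, 1\}$, $\{0, t\}$, $\{t-1, 1\}$ and $\{0, t, 1\}$; it relies on $p_2$ being non-increasing in $m$. The function names are hypothetical.

```python
def p2(m, t):
    """p_2(m, t): best value among the n = 2 candidate solutions of Section 3 (0 < t < 2)."""
    if t >= 2 * m:
        return 1.0                                      # trivial case
    vals = [((1 - m) / (1 - t / 2)) ** 2]               # support {t/2, 1}
    if m <= t <= 1:
        vals.append(1 - (m / t) ** 2)                   # support {0, t}
    if t > 1:
        vals.append(1 - ((1 + m - t) / (2 - t)) ** 2)   # support {t-1, 1}
    if t < 1 and t * t <= m:
        vals.append((1 - m) ** 2 / (1 - t * t))         # support {0, t, 1}
    return max(vals)


def upper_conf_bound(t, alpha, tol=1e-12):
    """Solve p2(m, t) = alpha for m by bisection, using that p2 is non-increasing in m."""
    lo, hi = t / 2, 1.0  # p2(t/2, t) = 1, and p2(m, t) -> 0 as m -> 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p2(mid, t) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


m_u = upper_conf_bound(0.5, 0.05)  # observed sum t = 0.5, level alpha = 0.05
```

For this input the $\{t/2, 1\}$ candidate is the relevant one, so $m_u = 1 - 0.75\sqrt{0.05} \approx 0.832$.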

2 Mattner's Lagrange approach 

Mattner [4] developed a general method for treating extremal problems for probability distributions. His main theorem is stated below and subsequently applied to the problem:

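Theorem 1. Let $Z$ be a Banach space, $\varphi_i : Z \to \mathbb{R}$, $i = 0, \ldots, k$ and $\psi_j : Z \to \mathbb{R}$, $j = 1, \ldots, l$, continuously Fréchet-differentiable, and $C$ a convex cone in $Z$. Define the Lagrange functional

$$\mathcal{L}(z) := \lambda_0 \varphi_0(z) + \sum_{i=1}^{k} \lambda_i \varphi_i(z) + \sum_{j=1}^{l} \alpha_j \psi_j(z)$$

and let $d\mathcal{L}(z; w)$ denote the Fréchet-derivative of $\mathcal{L}(z)$ in direction $w$. If $z \in Z$ minimizes $\varphi_0$ subject to

$$\varphi_i(z) \le 0,\ i = 1, \ldots, k, \qquad \psi_j(z) = 0,\ j = 1, \ldots, l, \qquad z \in C,$$

then there exist $\lambda_0, \lambda_1, \ldots, \lambda_k, \alpha_1, \ldots, \alpha_l \in \mathbb{R}$ with

(i) not all $\lambda_i$ and $\alpha_j$ vanish,
(ii) $\lambda_i \ge 0$, $i = 0, \ldots, k$,
(iii) $d\mathcal{L}(z; w) \ge 0$, $w \in C$,
(iv) $d\mathcal{L}(z; z) = 0$.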

Application to the problem. Let $\mathcal{M}$ be the set of signed Borel measures on $[0,1]$, with norm $\|\nu\| = \int |\nu(dx)|$ (total variation). The pair $(\mathcal{M}, \|\cdot\|)$ is a Banach space and probability measures are contained in the positive cone $C = \{\mu \in \mathcal{M} : \mu(B) \ge 0 \text{ for every Borel set } B\}$. Define

$$\varphi_0(\mu) = -P(S_n \le t) = -\int_A \mu(dx_1) \cdots \mu(dx_n),$$

where $A = \{(x_1, x_2, \ldots, x_n) : x_1 + \cdots + x_n \le t\}$. The (clearly continuous) Fréchet-derivative of $\varphi_0$ is given by

$$d\varphi_0(\mu; \nu) = -n \int_A \mu(dx_1) \cdots \mu(dx_{n-1})\, \nu(dx_n) = -n \int_0^1 P(S_{n-1} \le t - x)\, \nu(dx).$$

Indeed, this follows from the next formula, which is established by binomial expansion of the $n$-fold product of the measure $\mu + \nu$ (using the symmetry of $A$):

$$\left| \varphi_0(\mu + \nu) - \varphi_0(\mu) - d\varphi_0(\mu; \nu) \right| = O\left(\|\nu\|^2\right).$$
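The derivative formula can be checked by a finite-difference experiment. The following Python sketch (standard library; the discrete $\mu$, the signed direction $\nu$ and all names are hypothetical) compares $\varphi_0(\mu + \varepsilon\nu) - \varphi_0(\mu)$ with $d\varphi_0(\mu; \varepsilon\nu)$; since the remainder is of order $\|\varepsilon\nu\|^2$, shrinking $\varepsilon$ by a factor 10 should shrink it by roughly a factor 100.

```python
from itertools import product


def mass_of_A(weights, support, n, t, tol=1e-12):
    """n-fold product of a (signed) discrete measure, evaluated on A = {x_1 + ... + x_n <= t}."""
    total = 0.0
    for idx in product(range(len(support)), repeat=n):
        if sum(support[i] for i in idx) <= t + tol:
            w = 1.0
            for i in idx:
                w *= weights[i]
            total += w
    return total


def phi0(mu, support, n, t):
    return -mass_of_A(mu, support, n, t)


def dphi0(mu, nu, support, n, t):
    """-n * sum_x P_mu(S_{n-1} <= t - x) nu({x}), the claimed Frechet derivative."""
    return -n * sum(mass_of_A(mu, support, n - 1, t - x) * v for x, v in zip(support, nu))


support = [0.0, 0.3, 0.6, 1.0]
mu = [0.2, 0.3, 0.3, 0.2]      # probability weights
nu = [0.5, -0.2, -0.4, 0.1]    # signed direction
n, t = 3, 1.2


def remainder(eps):
    shifted = [a + eps * b for a, b in zip(mu, nu)]
    step = [eps * b for b in nu]
    return abs(phi0(shifted, support, n, t) - phi0(mu, support, n, t)
               - dphi0(mu, step, support, n, t))


r1, r2 = remainder(1e-2), remainder(1e-3)
```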

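For the constraints define

$$\varphi_1(\mu) = \mu([0,1]) - 1 \quad\text{and}\quad \varphi_2(\mu) = m - \int_0^1 x\,\mu(dx),$$

whose (continuous) Fréchet-derivatives are given by

$$d\varphi_1(\mu; \nu) = \nu([0,1]) \quad\text{and}\quad d\varphi_2(\mu; \nu) = -\int_0^1 x\,\nu(dx).$$

Now, define the Lagrange functional: $\mathcal{L}(\mu) = \lambda_0 \varphi_0(\mu) + \lambda_1 \varphi_1(\mu) + \lambda_2 \varphi_2(\mu)$. By the observation on subprobability measures in Section 1.3, relaxing the constraints to inequalities does not change the supremum. From Mattner's theorem one concludes: if $\mu$ minimizes $\varphi_0$ subject to $\varphi_1(\mu) \le 0$, $\varphi_2(\mu) \le 0$, $\mu \ge 0$, then there exist nonnegative $\lambda_0$, $\lambda_1$, and $\lambda_2$, not all zero, such that

$$d\mathcal{L}(\mu; \nu) \ge 0, \quad \text{for } \nu \ge 0, \tag{4}$$
$$d\mathcal{L}(\mu; \mu) = 0, \tag{5}$$

where $d\mathcal{L}(\mu; \nu)$ is the Fréchet-derivative of $\mathcal{L}$ at $\mu$ in direction $\nu$, given by

$$d\mathcal{L}(\mu; \nu) = \int_0^1 \ell(x)\, \nu(dx)$$

with

$$\ell(x) = -n\lambda_0 P(S_{n-1} \le t - x) + \lambda_1 - \lambda_2 x.$$

Note that $\ell(x)$ is continuous from the left and that jump-discontinuities (if any) are upwards.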

2.1 The Lagrange conditions 

From Mattner's theorem some properties of extremal distributions can be derived, as well as an expression of $P(S_n \le t)$ in terms of the Lagrange multipliers. First, the redundant Lagrange multiplier $\lambda_0$ is removed. From Lagrange condition (4), by substituting $\nu = \delta_x$ (point-mass at $x$), one may conclude $\ell(x) \ge 0$, for $0 \le x \le 1$. Combining this with the second Lagrange condition (5) results in:

$$\ell(x) = 0 \quad \text{for } \mu\text{-a.e. } x. \tag{6}$$

It is first argued that $\lambda_0$ cannot be zero. If $\lambda_0 = 0$, then $\ell(x) = \lambda_1 - \lambda_2 x$ should be nonnegative for $0 \le x \le 1$, whence $\lambda_1 \ge \lambda_2 \ge 0$ and, necessarily, $\lambda_1 > 0$, for they cannot all three be zero. However, $\ell(x) = 0$ must have at least one solution, or else $\operatorname{supp}(\mu) = \emptyset$. This leaves $\lambda_1 = \lambda_2$ as sole possibility, so that $\ell(x) = \lambda_1(1 - x)$ vanishes only at $x = 1$, implying that $\mu = \delta_1$, which contradicts the assumption $E[X] = m < 1$. Therefore, $\lambda_0 > 0$ and without loss of generality it is henceforth assumed that $n\lambda_0 = 1$.

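Lagrange condition (5), $\varphi_1(\mu) \le 0$, and $\varphi_2(\mu) \le 0$, imply

$$P(S_n \le t) = \lambda_1 \mu([0,1]) - \lambda_2 \int_0^1 x\,\mu(dx). \tag{7}$$

The Lagrange conditions can be restated as

$$P(S_{n-1} \le t - x) \le \lambda_1 - \lambda_2 x, \quad 0 \le x \le 1, \quad\text{and} \tag{8}$$
$$P(S_{n-1} \le t - x) = \lambda_1 - \lambda_2 x, \quad \text{for } x \in \operatorname{supp}(\mu). \tag{9}$$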
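To see conditions (8) and (9) in action, the sketch below checks them for $n = 2$ and the three point distribution on $\{0, t, 1\}$ that appears later in (18), where $1$ is a support point and hence $\lambda_1 = \lambda_2 = (1-m)/(1-t^2)$. It is a grid-based check under these assumptions (names are ours), and it also compares $P(S_2 \le t)$ with the value $\lambda_1 - \lambda_2 m$ given by (7).

```python
def check_lagrange_0t1(m, t, grid=2001, tol=1e-12):
    """Check (8) and (9) for n = 2 and the {0, t, 1} solution, with lambda_1 = lambda_2 = lam."""
    lam = (1 - m) / (1 - t * t)
    support = [0.0, t, 1.0]
    probs = [(1 - m) * (1 - t) / (1 - t * t), (1 - m) * t / (1 - t * t),
             (m - t * t) / (1 - t * t)]

    def F(y):  # distribution function of S_1 = X_1
        return sum(p for x, p in zip(support, probs) if x <= y + tol)

    xs = [i / (grid - 1) for i in range(grid)]
    cond8 = all(F(t - x) <= lam * (1 - x) + tol for x in xs)
    cond9 = all(abs(F(t - x) - lam * (1 - x)) < tol for x in support)
    p_s2 = sum(p * q for x, p in zip(support, probs)
               for y, q in zip(support, probs) if x + y <= t + tol)
    return cond8, cond9, p_s2, lam * (1 - m)  # the last two should agree by (7)


c8, c9, p_val, lagrange_val = check_lagrange_0t1(0.8, 0.85)
```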

The following lemma shows that the last statement follows from (6):

Lemma 1. Let $\mu$ be extremal. Then $\ell(x) = 0$ for $x \in \operatorname{supp}(\mu)$.

Proof. Let $x \in \operatorname{supp}(\mu)$ and suppose a Borel set $A \subset [0,1]$ satisfies $\mu(A) = 1$ and $\ell(y) = 0$ for $y \in A$. If $x$ is an atom of $\mu$ then $x \in A$ and $\ell(x) = 0$ follows. Otherwise, if $x$ is an interior point or a right boundary point of $\operatorname{supp}(\mu)$, a sequence $(x_k)$ can be found within $A$ such that $x_k \uparrow x$, whence $\ell(x) = 0$, by left-continuity of $\ell$. If $x$ is a left boundary point, one can find within $A$ a sequence $x_k \downarrow x$, whence $0 \le \ell(x) \le \ell(x+) = \lim \ell(x_k) = 0$, since jumps cannot go down. □

It is shown that $\lambda_2 > 0$ must hold. Let $s = \min \operatorname{supp}(\mu)$ and $u = \max \operatorname{supp}(\mu)$, then $s < m < u$ and gaps in $\operatorname{supp}(S_{n-1})$ cannot exceed $u - s$ in length. Lagrange conditions (8) (at $x = 0$) and (9) (at $x = u$) imply $P(t - u < S_{n-1} \le t) \le \lambda_2 u$. The probability, however, must be positive: $E[S_{n-1}] = (n-1)m > t - m > t - u$ implies that $P(t - u < S_{n-1})$ must be positive; $P(t < S_{n-1}) = 1$ cannot be the case, or else $P(S_n \le t) = 0$ and $\mu$ is not extremal.

Support conditions. The Lagrange conditions imply several properties for the support of $S_{n-1}$ and $S_n$. An immediate consequence of the next lemma is that $t \in \operatorname{supp}(S_n)$.

Lemma 2. Let $\mu$ be extremal, $x \ne 1$. If $x \in \operatorname{supp}(\mu)$ then $t - x \in \operatorname{supp}(S_{n-1})$.

Proof. By contraposition. Suppose $t - x \notin \operatorname{supp}(S_{n-1})$, for some $0 \le x < 1$. Let $B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon)$. Then $P(S_{n-1} \in B_\varepsilon(t - x)) = 0$ for some $\varepsilon > 0$, implying that $P(S_{n-1} \le t - y)$ is constant for $y \in B_\varepsilon(x)$ and that $\ell(y) \ge 0$ is linearly decreasing on this set (as $\lambda_2 > 0$), which implies $\ell(x) > 0$. For $x = 0$, this reasoning shows that $\ell(y)$ is linearly decreasing for $y \in [0, \varepsilon)$, with $\ell(0) > 0$ as conclusion. For $x = 1$, nothing about the positivity of $\ell(1)$ can be concluded from the fact that $\ell(y)$ is linearly decreasing and nonnegative for $y \in (1 - \varepsilon, 1]$; $\ell(1) = 0$ is still possible. So, for $0 \le x < 1$, $t - x \notin \operatorname{supp}(S_{n-1})$ implies $\ell(x) > 0$, which by Lemma 1 implies $x \notin \operatorname{supp}(\mu)$. □

The Lagrange conditions in this section provide necessary conditions (8), 
(9), and Lemma 2, that should be satisfied by extremal distributions. It is 
not difficult, for general n, to identify a number of distributions that satisfy 
them (see Section 4.1). However, unless all the solutions are identified, 
there are no guarantees that the best of the solutions found indeed attains 
the supremum $p(m,t)$.



3 The case n = 2 

Lemma 2 yields an especially strong result for $n = 2$, because $S_{n-1} = X_1$ and the lemma characterizes the support of (candidate) extremal distributions. Below, certain two and three point solutions to the Lagrange conditions will be identified. Other solutions (if any) cannot be extremal: it will be shown that one can always find a distribution of the two or three point type that has a strictly larger $P(S_2 \le t)$-value. Hence, all extremal distributions belong to this special class.

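Suppose $\mu$ satisfies the Lagrange conditions and $s = \min \operatorname{supp}(\mu)$. Lemma 2 implies that $t - s \in \operatorname{supp}(\mu)$, and if this is not the largest support point, then $\max \operatorname{supp}(\mu) = 1$ (a largest support point $u < 1$ with $u > t - s$ would force the point $t - u < s$ into the support). Therefore, two cases are to be considered.

First, assume that $t - s = \max \operatorname{supp}(\mu)$. Note that, necessarily, $0 \le s < t - s \le 1$ and $t - s > m$ (or else $\int x\,\mu(dx) < m$), which imply $s < t/2 < t - s$ by $m > t/2$. Note that $m < t$ must hold, or no such $\mu$ exist.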
Let $F$ be the distribution function corresponding to $\mu$. Lagrange condition (9) requires that nonnegative $\lambda_1$ and $\lambda_2$ exist such that:

$$F(t - s) = \lambda_1 - \lambda_2 s \quad\text{and}\quad F(s) = \lambda_1 - \lambda_2 (t - s).$$

From the monotonicity and nonnegativity of $F$:

$$\int_0^1 F(x)\,dx \ge (t - 2s) F(s) + (1 - t + s) F(t - s), \tag{10}$$

where equality holds (if and) only if $s$ and $t - s$ are the only support points. Since $F(t - s) = F(1)$, one may write $\lambda_1 = F(1) + \lambda_2 s$ and $F(s) = F(1) - \lambda_2 (t - 2s)$. Combining things, one obtains:

$$\int_0^1 x\,\mu(dx) = F(1) - \int_0^1 F(x)\,dx \le s F(1) + \lambda_2 (t - 2s)^2, \tag{11}$$

whence $\lambda_2 \ge (m - s)/(t - 2s)^2$. Starting from (7), this results in:

$$P_\mu(S_2 \le t) \le \lambda_1 - \lambda_2 m = F(1) - \lambda_2 (m - s) \le 1 - \left(\frac{m - s}{t - 2s}\right)^2. \tag{12}$$

Note that $0 < (m - s)/(t - 2s) < 1$ since $s < t/2 < m$ and $m < t - s$. Let $\pi$ be the probability measure on $\{s, t - s\}$ defined by $\pi_{t-s} = (m - s)/(t - 2s) = 1 - \pi_s$, where $\pi_x := \pi(\{x\})$. Then $\int x\,\pi(dx) = m$ and $P_\pi(S_2 \le t)$ equals the right hand side of (12). The upper bound on $P_\mu(S_2 \le t)$ is strict, unless $F(1) = 1$, $\int x\,\mu(dx) = m$, and equality holds in (10). These conditions, however, uniquely identify $\pi$, showing that $\mu$ can only be extremal if $\mu = \pi$.
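The claim that $\pi$ attains the right hand side of (12) is easy to check by enumeration. The Python sketch below (helper names ours; $m = 0.5$, $t = 0.9$ chosen for illustration) builds $\pi$ on $\{s, t-s\}$ for several feasible $s$, verifies the mean and the value $1 - ((m-s)/(t-2s))^2$, and shows that the value decreases as $s$ grows.

```python
def two_point_pi(m, t, s):
    """pi on {s, t - s} with pi_{t-s} = (m - s)/(t - 2s); needs 0 <= s < t/2 < m < t - s <= 1."""
    upper = (m - s) / (t - 2 * s)
    return [s, t - s], [1 - upper, upper]


def prob_s2_le(support, probs, t, tol=1e-12):
    """P(X_1 + X_2 <= t) for i.i.d. X_i with a finite distribution."""
    return sum(p * q for x, p in zip(support, probs)
               for y, q in zip(support, probs) if x + y <= t + tol)


m, t = 0.5, 0.9
results = []
for s in (0.0, 0.1, 0.2, 0.3):
    support, probs = two_point_pi(m, t, s)
    mean = sum(x * p for x, p in zip(support, probs))
    rhs_12 = 1 - ((m - s) / (t - 2 * s)) ** 2
    results.append((s, mean, prob_s2_le(support, probs, t), rhs_12))
```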

Next, consider the situation where $t - s < \max \operatorname{supp}(\mu) = 1$. Necessarily, $0 \le s \le t - s < 1$ must hold. Lagrange condition (9) specifies for the support points $s$, $t - s$ and $1$, respectively:

$$F(t - s) = \lambda_1 - \lambda_2 s, \quad F(s) = \lambda_1 - \lambda_2 (t - s), \quad\text{and}\quad F(t - 1) = \lambda_1 - \lambda_2.$$

Since $t - 1 < s = \min \operatorname{supp}(\mu)$, $F(t - 1) = 0$ and so $\lambda_1 = \lambda_2$, which is used to eliminate $\lambda_1$.

Note that $F(t - s) < F(1)$. Further, $F(1) = 1$ must hold, or the defect could be added as an atom in 0, which would strictly enlarge $P_\mu(S_2 \le t)$. If $F(1) = 1$ and $\int_0^1 x\,\mu(dx) > m$ then a small mass $\varepsilon > 0$ could be moved from 1 to 0, still keeping the mean above $m$. This would increase $P_\mu(S_2 \le t)$ by at least $\varepsilon^2$. Hence, if $\mu$ is to be extremal, then $F(1) = 1$ and $\int_0^1 x\,\mu(dx) = m$ must hold.

Combining (10) with $F(t - s) = \lambda_2 (1 - s)$ and $F(s) = \lambda_2 (1 - t + s)$, one obtains

$$1 - m = \int_0^1 F(x)\,dx \ge \lambda_2 (1 - t + s)(1 + t - 3s), \tag{13}$$

where equality holds (if and) only if $s$, $t - s$ and 1 are the only support points. Apparently,

$$\lambda_2 \le \lambda_2^+ := \frac{1 - m}{(1 - t + s)(1 + t - 3s)}.$$

Define the measure $\pi$ on $\{s, t - s, 1\}$ by

$$\pi_s = \lambda_2^+ (1 - t + s), \quad \pi_{t-s} = \lambda_2^+ (t - 2s), \quad \pi_1 = 1 - \lambda_2^+ (1 - s). \tag{14}$$

(Note that $s = t - s$ leaves a valid probability measure on the set $\{t/2, 1\}$.) If $\lambda_2^+ (1 - s) \le 1$, then $\pi$ is a probability measure with mean $m$: the probabilities are nonnegative and sum to 1, and $s\pi_s + (t - s)\pi_{t-s} + \pi_1 = 1 - \lambda_2^+ (1 - t + s)(1 + t - 3s) = m$. Furthermore, $\pi$ satisfies the Lagrange conditions (for $\lambda_2^+$) and so, by (7),

$$P_\mu(S_2 \le t) = \lambda_2 (1 - m) \le \lambda_2^+ (1 - m) = P_\pi(S_2 \le t) = \frac{(1 - m)^2}{(1 - t + s)(1 + t - 3s)}.$$

The inequality is strict unless $\mu = \pi$; this can be seen from (13).
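The three point measure (14) can likewise be checked by direct enumeration. In the sketch below (Python; $m = 0.8$, $t = 0.85$ and the grid of $s$-values are illustrative choices, all of which satisfy (17)), each $\pi$ is tested for total mass 1, mean $m$, and the value $(1-m)^2/\big((1-t+s)(1+t-3s)\big)$.

```python
def three_point_pi(m, t, s):
    """pi on {s, t - s, 1} from (14), with lambda_2^+ = (1 - m) / ((1 - t + s)(1 + t - 3s))."""
    lam = (1 - m) / ((1 - t + s) * (1 + t - 3 * s))
    return [s, t - s, 1.0], [lam * (1 - t + s), lam * (t - 2 * s), 1 - lam * (1 - s)]


def prob_s2_le(support, probs, t, tol=1e-12):
    """P(X_1 + X_2 <= t) for i.i.d. X_i with a finite distribution."""
    return sum(p * q for x, p in zip(support, probs)
               for y, q in zip(support, probs) if x + y <= t + tol)


m, t = 0.8, 0.85
rows = []
for s in (0.0, 0.1, 0.2, 0.3, 0.4):
    support, probs = three_point_pi(m, t, s)
    rows.append({
        "s": s,
        "feasible": min(probs) >= 0,  # equivalent to (17)
        "total": sum(probs),
        "mean": sum(x * p for x, p in zip(support, probs)),
        "value": prob_s2_le(support, probs, t),
        "formula": (1 - m) ** 2 / ((1 - t + s) * (1 + t - 3 * s)),
    })
```

For this $(m,t)$-pair the largest value occurs at $s = 0$, in line with the later finding that the $\{0, t, 1\}$ solution is best for $4/5 \le t \le \sqrt{m}$.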
If $\lambda_2^+ (1 - s) > 1$, then

$$(1 - m)(1 - s) > (1 - t + s)(1 + t - 3s) = (1 - s)^2 - (t - 2s)^2,$$

which is equivalent to

$$(t - 2s)^2 > (1 - s)(m - s). \tag{15}$$

Since $\lambda_2 (1 - s) = F(t - s) \le 1$, starting from (7) (with $\lambda_1 = \lambda_2$),

$$P_\mu(S_2 \le t) = \lambda_2 (1 - m) \le \frac{1 - m}{1 - s} = 1 - \frac{m - s}{1 - s} < 1 - \left(\frac{m - s}{t - 2s}\right)^2, \tag{16}$$

where the last inequality follows from (15). That inequality also implies $t - 2s > m - s$ (as $1 - s \ge m - s > 0$), which combined with $m > s$ guarantees the existence of the probability measure with support $\{s, t - s\}$ and mean $m$. As was shown before, this measure attains the value on the right hand side of (16). In all cases it has now been shown that if $\mu$ satisfies the Lagrange conditions it equals a discrete measure on $\{s, t - s\}$ or $\{s, t - s, 1\}$, for some $s$, or else $P_\mu(S_2 \le t) < P_\pi(S_2 \le t)$ for some $\pi$ from this class.

The last steps of the solution consist of optimizing within the class of two 
and three point support distributions just identified. 

Remark: an alternative approach? What follows is a sketch of a proof that would work if one could show that extremal measures cannot have a singular component. The Lagrange conditions imply that if the support contains an interval, say $A$, then $\mu$ has density equal to $\lambda_2$ on that interval. Lemma 2 implies that the same holds for $t - A$, from which it easily follows that $\mu$ can be improved upon by moving the mass to the center of the intervals. If $\mu$ is purely atomic, a similar argument that exploits the symmetry of the support can be used to show that $[0, t/2)$ cannot contain more than one atom. After this, four possible support points remain: (some) $s$, $t/2$, $t - s$, and 1. A simple mass transfer argument shows that the first three cannot occur together. This leaves one with the same possibilities as in the current line of reasoning.

3.1 $\{s, t-s\}$-solutions

Recall that the bound from (12) and (16) can be attained by the probability measure $\pi$ on $\{s, t - s\}$ defined by $\pi_{t-s} = (m - s)/(t - 2s) = 1 - \pi_s$. The largest $P_\pi(S_2 \le t)$-value is attained for the smallest feasible $s$, as $\pi_{t-s}$ is increasing in $s$ (its derivative is $(2m - t)/(t - 2s)^2 > 0$). For $t \le 1$ the maximum is at $s = 0$, for $t > 1$ at $s = t - 1$. Thus, the best solutions of this type are as follows.

For $m \le t \le 1$: $\pi_0 = 1 - m/t$, $\pi_t = m/t$ and

$$P_\pi(S_2 \le t) = 1 - \left(\frac{m}{t}\right)^2.$$

For $t > 1$: $\pi_{t-1} = (1 - m)/(2 - t)$, $\pi_1 = (1 + m - t)/(2 - t)$ and

$$P_\pi(S_2 \le t) = 1 - \left(\frac{1 + m - t}{2 - t}\right)^2.$$

3.2 $\{s, t-s, 1\}$-solutions

Candidate extremal distributions are the probability measures with support $\{s, t - s, 1\}$ and probabilities given by (14), provided $0 \le s \le t/2$, $s \ge t - 1$ and $\lambda_2^+ (1 - s) \le 1$, or:

$$(1 - m)(1 - s) \le (1 - t + s)(1 + t - 3s). \tag{17}$$

For the sake of a simpler exposition two small additions were made to the class considered: equality in the previous formula corresponds to boundary cases with $\pi_1 = 0$, which, just as the case $s = t - 1$ that was added, leads to distributions already considered.

Recall that $P_\pi(S_2 \le t) = (1 - m)^2/\big((1 - t + s)(1 + t - 3s)\big)$ for $\pi$ as in (14). What remains is to maximize over feasible $s$. Define $\rho(s) = (1 - t + s)(1 + t - 3s)$. Since $\rho$ is a concave function, the solutions to (17) constitute an interval; call the left end point $s_0$. Since $\rho(s) = (1 - s)^2 - (t - 2s)^2$, (17) is equivalent to $(t - 2s)^2 \le (1 - s)(m - s)$, which shows that $s = t/2$ is always feasible, and only feasible $s$ between $s_0$ and $t/2$ need to be considered.

Consider the maximization problem: since $\rho$ is concave, the maximum of $(1 - m)^2/\rho(s)$ is attained at an end point of the feasible range, i.e., $t/2$ or the left end point. Note, however, that as $s \downarrow s_0$, also $\pi_1 \downarrow 0$, and what results is a distribution on $\{s_0, t - s_0\}$, already considered in the previous section. This means that if $s_0 \ge 0$ and $s_0 \ne t - 1$, the entire range $s_0 \le s \le t/2$ corresponds to feasible solutions, at the left end dominated by solutions already considered. Then, the distribution corresponding to the right end point is the only new (candidate) extremal distribution. It is given by: $\pi_{t/2} = (1 - m)/(1 - t/2)$ and $\pi_1 = (m - t/2)/(1 - t/2)$ with

$$P_\pi(S_2 \le t) = \left(\frac{1 - m}{1 - t/2}\right)^2.$$

Note that this solution exists for all $(m,t)$-pairs under consideration. What remains now is to determine whether the maximum can be attained for an intermediate value $s_0 < s < t/2$, which would correspond to a true three point distribution.

First, a closer look at $\rho(s)$ is warranted. It has zeros at $t - 1$ and $(t + 1)/3$, a maximum value of $(2 - t)^2/3$ attained at $(2t - 1)/3$; $\rho(0) = 1 - t^2$ and $\rho((5t - 4)/6) = \rho(t/2) = (1 - t/2)^2$. For $0 < t < 2$ these points are ordered in the following manner:

$$t - 1 < \frac{5t - 4}{6} < \frac{2t - 1}{3} < \frac{t}{2} < \frac{t + 1}{3},$$

where zero can be anywhere to the left of $t/2$, depending on $t$.
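The stated properties of $\rho$ are elementary, but a numerical spot check guards against sign slips. The Python sketch below (names ours) verifies, for a few values $0 < t < 2$, the zeros, the maximum, the values at $0$, $(5t-4)/6$ and $t/2$, the identity $\rho(s) = (1-s)^2 - (t-2s)^2$, and the ordering of the five points.

```python
def rho(s, t):
    """rho(s) = (1 - t + s)(1 + t - 3s) from Section 3.2."""
    return (1 - t + s) * (1 + t - 3 * s)


def rho_facts(t, tol=1e-12):
    """Check the stated properties of rho for a given 0 < t < 2."""
    vertex = (2 * t - 1) / 3
    mirror = (5 * t - 4) / 6
    points = [t - 1, mirror, vertex, t / 2, (t + 1) / 3]
    return {
        "zeros": abs(rho(t - 1, t)) < tol and abs(rho((t + 1) / 3, t)) < tol,
        "max": abs(rho(vertex, t) - (2 - t) ** 2 / 3) < tol,
        "rho0": abs(rho(0.0, t) - (1 - t * t)) < tol,
        "mirror": abs(rho(mirror, t) - (1 - t / 2) ** 2) < tol
                  and abs(rho(t / 2, t) - (1 - t / 2) ** 2) < tol,
        "identity": all(abs(rho(s, t) - ((1 - s) ** 2 - (t - 2 * s) ** 2)) < tol
                        for s in (0.0, 0.1, 0.25, t / 3)),
        "ordered": all(a < b for a, b in zip(points, points[1:])),
    }
```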

From equality in (17) one sees that $\rho(s_0) = (1 - m)(1 - s_0) > 0$, and since $t - 1$ is the left zero of $\rho$, this implies that $s_0 > t - 1$. Therefore, for $t > 1$, the whole range $s_0 \le s \le t/2$ corresponds to feasible solutions dominated by one of the two point solutions corresponding to the end points, i.e., with support $\{s_0, t - s_0\}$ or $\{t/2, 1\}$. Next, consider $t \le 1$. A true three point solution occurs if $s_0 < 0$, which happens if inequality (17) is strict for $s = 0$, which happens if $t < \sqrt{m}$. However, if $t < 4/5$, then $(5t - 4)/6 < 0$ and $\rho(0) > \rho((5t - 4)/6) = \rho(t/2)$, whence $(1 - m)^2/\rho(s)$ attains a higher value at $s = t/2$ than at $s = 0$. In summary, this shows that the best solution is obtained at $s = 0$ only for $4/5 \le t \le \sqrt{m}$. It is the extremal probability measure $\pi$ on $\{0, t, 1\}$ defined by: $\pi_0 = (1 - m)(1 - t)/(1 - t^2)$, $\pi_t = (1 - m)t/(1 - t^2)$, and $\pi_1 = (m - t^2)/(1 - t^2)$, with

$$P_\pi(S_2 \le t) = \frac{(1 - m)^2}{1 - t^2}. \tag{18}$$

The $(m,t)$-range where this distribution dominates all others has just been determined. On the complement of this range several two point solutions may exist together and therefore need to be compared.



3.3 Some comparisons

In the region $m \le t \le 1$, both the $\{0, t\}$ and the $\{t/2, 1\}$-solution exist. The second is the best when

$$\left(\frac{1 - m}{1 - t/2}\right)^2 \ge 1 - \left(\frac{m}{t}\right)^2. \tag{19}$$

Substituting $m = at$, the equivalent inequality $4(1 - at)^2 - (1 - a^2)(2 - t)^2 \ge 0$ is obtained, which in turn simplifies to $(5t^2 - 4t + 4)a^2 - 8at - t^2 + 4t \ge 0$. The discriminant of this quadratic in $a$ is $20t^4 - 96t^3 + 144t^2 - 64t$, which factors as $4t(5t - 4)(2 - t)^2$. This shows that the inequality (19) is valid for $0 < t < 4/5$, as the discriminant is negative for these values (and the leading coefficient $5t^2 - 4t + 4$ is positive). For $4/5 \le t \le 1$, the boundary curve of the inequality (19) is given by

$$m_1(t) = \frac{4t^2 - (2 - t)\sqrt{t^3(5t - 4)}}{5t^2 - 4t + 4}. \tag{20}$$

For $m < m_1(t)$ the $\{t/2, 1\}$ solution is superior; for larger $m$ the $\{0, t\}$ solution is.
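The algebra leading to (20) can be spot-checked numerically. The sketch below (Python; names ours) compares the discriminant with its factorization, confirms that the two sides of (19) coincide at $m = m_1(t)$, and checks that (19) holds just below $m_1(t)$ and fails just above it. The square root argument is clipped at 0 only to absorb rounding at $t = 4/5$.

```python
import math


def m1(t):
    """Boundary curve (20), for 4/5 <= t <= 1."""
    root = math.sqrt(max(0.0, t**3 * (5 * t - 4)))  # clip tiny negative rounding at t = 4/5
    return (4 * t**2 - (2 - t) * root) / (5 * t**2 - 4 * t + 4)


def gap(m, t):
    """{t/2, 1}-value minus {0, t}-value; inequality (19) holds iff gap >= 0."""
    return ((1 - m) / (1 - t / 2)) ** 2 - (1 - (m / t) ** 2)


def discriminant(t):
    """Discriminant of (5t^2 - 4t + 4) a^2 - 8 t a - t^2 + 4t as a quadratic in a."""
    return 64 * t**2 - 4 * (5 * t**2 - 4 * t + 4) * (4 * t - t**2)
```

At $t = 1$ the curve gives $m_1(1) = 3/5$.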

To determine for which $m$ and $t$ the $\{t/2, 1\}$-solution is best for $t > 1$ one needs to solve

$$1 - \left(\frac{m + 1 - t}{2 - t}\right)^2 \le \left(\frac{1 - m}{1 - t/2}\right)^2.$$
Figure 1: Support points of extremal distributions, for $n = 2$ (horizontal axis: $t$).

Setting $a = (1 - m)/(2 - t)$, this becomes $4a^2 \ge 1 - (1 - a)^2$, resulting in $a \ge 2/5$, or $5m \le 1 + 2t$.
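The comparisons of Sections 3.2 and 3.3 can be assembled into a single selection rule: among the candidate supports that are feasible at $(m,t)$, pick the one with the largest $P(S_2 \le t)$. The sketch below (Python; the string labels and function names are ours, and it assumes $0 < t < 2m$) does exactly that, and also confirms that the $\{t/2, 1\}$- and $\{t-1, 1\}$-values coincide on the line $5m = 1 + 2t$.

```python
def candidates(m, t):
    """P(S_2 <= t) for each feasible candidate support of Section 3 (assumes 0 < t < 2m)."""
    vals = {"{t/2,1}": ((1 - m) / (1 - t / 2)) ** 2}
    if m <= t <= 1:
        vals["{0,t}"] = 1 - (m / t) ** 2
    if t > 1:
        vals["{t-1,1}"] = 1 - ((1 + m - t) / (2 - t)) ** 2
    if t < 1 and t * t <= m:
        vals["{0,t,1}"] = (1 - m) ** 2 / (1 - t * t)
    return vals


def extremal_support(m, t):
    """Label of the candidate with the largest P(S_2 <= t)."""
    vals = candidates(m, t)
    return max(vals, key=vals.get)


on_line = candidates(0.72, 1.3)  # 5m = 1 + 2t
```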

Summarizing everything, one obtains the following table. The function $m_1(t)$ appearing in it is given in equation (20). Figure 1 shows the regions with the support of the respective extremal distributions.

$$\begin{array}{lll}
\text{support} & (t, m)\text{-region} & P(S_2 \le t) \\ \hline
\{0, t\} & 4/5 \le t \le 1,\ m_1(t) \le m \le t^2 & 1 - (m/t)^2 \\
\{0, t, 1\} & 4/5 \le t < 1,\ t^2 \le m < 1 & (1 - m)^2/(1 - t^2) \\
\{t - 1, 1\} & 1 < t < 2,\ (1 + 2t)/5 \le m < 1 & 1 - \big((1 + m - t)/(2 - t)\big)^2 \\
\{t/2, 1\} & \text{everywhere else (with } t/2 < m < 1) & \big((1 - m)/(1 - t/2)\big)^2
\end{array}$$

The extremal measures are unique except on the boundary between the $\{t/2, 1\}$-solution and the others, where two distinct solutions yield the same $P(S_2 \le t)$-value; on the other boundaries, the two solutions coincide. Figure 2 shows a contour plot of the ratio of the supremum $p_2(m,t)$ and the Hoeffding bound (3); the bound is sharp at the boundary $m = t/2$ and progressively looser as $m$ increases.

Figure 2: Ratio of $p_2(m,t)$ and the Hoeffding bound (3); contour lines correspond to 0.1, 0.2, ..., 0.9, going from top to bottom.

4 The conjectures: one refuted, one revised

Strictly speaking, the results for $n = 2$ suffice to disprove both conjectures. Whereas the Statistica Neerlandica conjecture can be utterly disproved, the SIAM Review conjecture is only disproved by a small $(m,t)$-region where the extremal distribution has three support points (including 0 and 1!). In our view the SIAM Review conjecture is close to what may be true and therefore a revised conjecture is formulated below.

Statistica Neerlandica. It seems that what was meant is "If $p_n(m,t) = \sup P(S_n \le t)$ is small (enough), then the Bernoulli with mean $m$ is the best distribution." Looking at the $n = 2$ results, the Bernoulli only appears as extremal distribution for $t = 0$ and for $t = 1$, $3/5 \le m < 1$. Let $b(n,p,x)$ denote the probability that a binomial random variable with parameters $n$ and $p$ attains a value less than or equal to $x$. The following is the logical negation of the Statistica Neerlandica conjecture:

Lemma 3. For any $0 < p_0 < 1$ there exist $n$, $t$, and $m$, such that $b(n,m,t) < p_n(m,t) < p_0$.

Proof. Set $n = 2$ and choose any $t$ in $(0,1)$ or $(1,2)$. Then for $m > t/2$: $b(n,m,t) < p_n(m,t)$ and as $m \uparrow 1$, $p_n(m,t) \to 0$. □
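The proof of Lemma 3 can be followed numerically. The sketch below assumes the $n = 2$ values of $p_2(m,t)$ from Section 3 (best feasible candidate) and uses `math.comb` for the binomial distribution function; `witness` is a hypothetical helper that moves $m$ towards 1 until $p_2(m,t) < p_0$ and reports $b(2,m,t)$ alongside.

```python
import math


def binom_cdf(n, p, x):
    """b(n, p, x) = P(Bin(n, p) <= x)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(int(math.floor(x)) + 1))


def p2(m, t):
    """p_2(m, t) as the best of the n = 2 candidate solutions of Section 3, for 0 < t < 2."""
    if t >= 2 * m:
        return 1.0
    vals = [((1 - m) / (1 - t / 2)) ** 2]
    if m <= t <= 1:
        vals.append(1 - (m / t) ** 2)
    if t > 1:
        vals.append(1 - ((1 + m - t) / (2 - t)) ** 2)
    if t < 1 and t * t <= m:
        vals.append((1 - m) ** 2 / (1 - t * t))
    return max(vals)


def witness(p0, t):
    """Return (m, b(2, m, t), p_2(m, t)) for the first m = 1 - 10^(-k) with p_2(m, t) < p0."""
    k = 1
    while True:
        m = 1 - 10.0 ** (-k)
        if m > t / 2 and p2(m, t) < p0:
            return m, binom_cdf(2, m, t), p2(m, t)
        k += 1
```

For example, `witness(0.01, 1.5)` stops at $m = 0.999$, where the Bernoulli value is about $0.002$ while $p_2 \approx 0.004$.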

4.1 The SIAM Review conjecture revised 

In order to maximize $P(S_n \le t)$, it seems that as much probability mass as possible should be on or near the boundary $S_n = t$; the support condition from Lemma 2 illustrates this. Furthermore, the fewer support points $\mu$ has, the more mass can contribute to the event $S_n = t$; an illustration of this can be seen in the remark on an alternative approach in Section 3, where shrinking a continuous portion of the distribution to one point doubles the contribution to $P(S_n \le t)$. Sometimes, however, putting some probability mass at 1 may enable a redistribution of mass on lower support points that results in an increase of $P(S_n \le t)$. This (we think) is the intuitive explanation for the $\{0, t, 1\}$ solution. It is also the reason we think that the number of support points required is no larger than three.

Conjecture 1. For any $n \ge 2$, $0 < m < 1$, and $0 < t < mn$, all distributions attaining the supremum $p_n(m,t)$ belong to the collections described below.

Conjectured extremal binary solutions. A collection of at most $n$ distributions with two support points is identified below. They may not all satisfy all of the Lagrange conditions. However, it is conjectured that if an extremal distribution is binary, it must be one of these.

Suppose the support is $\{a, b\}$, with $0 \le a < m < b \le 1$. Lemma 2 implies $t - a \in \operatorname{supp}(S_{n-1})$, which means that $t - a = jb + (n - 1 - j)a$ for some integer $j = 0, 1, \ldots, n - 1$. From $E[X] = m$ follows that $\pi := P(X = b) = (m - a)/(b - a) = (m - a)j/(t - na)$ (where $b - a = (t - na)/j$ is used) and so $P(S_n \le t) = b(n, \pi, j)$. Since $\pi$ is increasing in $a$ and $b(n, \pi, j)$ decreasing in $\pi$, one should minimize $a$. From $0 \le a < m$ it follows that $m - (mn - t)/j < b \le t/j$, so for $0 \le j < t$ the constraint $b \le 1$ becomes active as $a \downarrow 0$. Hence, for these $j$, the solution is $b_j = 1$, $a_j = (t - j)/(n - j)$ (from $a = (t - jb)/(n - j)$) and $\pi_j = 1 - (1 - m)(n - j)/(n - t)$. Considering that $m < b \le 1$ implies $(t - j)/(n - j) \le a < (t - jm)/(n - j)$, one sees that for $t \le j < t/m$ one should set $a_j = 0$, $b_j = t/j$, and $\pi_j = jm/t$.

This results in a collection of at most $n$ potential extremal distributions, from which the best is selected by comparing the values $b(n, \pi_j, j)$, for $j = 0, 1, \ldots, \lceil t/m \rceil - 1$.
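The binary collection is straightforward to enumerate. The following Python sketch (standard library; function names are ours) lists $(j, a_j, b_j, \pi_j, b(n, \pi_j, j))$ for $j = 0, 1, \ldots, \lceil t/m \rceil - 1$ using the two cases $j < t$ and $t \le j < t/m$ described above, and picks the candidate with the largest value. For $n = 2$ it can be compared with the closed forms of Section 3.

```python
import math


def binom_cdf(n, p, j):
    """b(n, p, j) = P(Bin(n, p) <= j) for an integer j."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j + 1))


def binary_candidates(n, m, t):
    """Conjectured binary candidates (j, a_j, b_j, pi_j, b(n, pi_j, j)) for 0 < t < m n."""
    out = []
    for j in range(math.ceil(t / m)):  # j = 0, 1, ..., ceil(t/m) - 1
        if j < t:
            a, b = (t - j) / (n - j), 1.0
            pi = 1 - (1 - m) * (n - j) / (n - t)
        else:
            a, b = 0.0, t / j
            pi = j * m / t
        out.append((j, a, b, pi, binom_cdf(n, pi, j)))
    return out


def best_binary(n, m, t):
    return max(binary_candidates(n, m, t), key=lambda c: c[-1])
```

For $n = 2$ the candidates reduce to the $\{t/2, 1\}$, $\{t-1, 1\}$ and $\{0, t\}$ solutions, which is what the checks below confirm at three sample points.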

Conjectured extremal ternary solutions. It was shown in Section 1.3 that the supremum $p_n(m,t)$ is attained by a distribution with 0 or 1 in the support. The intuitive argument given above suggests that an extremal ternary distribution will have both 0 and 1 in the support. Using this as an assumption, a collection of (at most $\binom{n}{2}$) possible three point supports can be identified. In order to precisely specify the distributions, the Lagrange linearity condition (9) is needed as well.

Suppose the support is $\{0, a, 1\}$ with $0 < a < 1$. Lemma 2 implies $\{t - a, t\} \subset \operatorname{supp}(S_{n-1})$, whence integers $k \ge 1$ and $l \ge 0$ should exist, such that $k + l \le n - 1$ and $t = ka + l$. Solving the last equation for $a$, define $a_{k,l} = (t - l)/k$, which is between 0 and 1 if $0 \le l < t < l + k \le n - 1$. In contrast with the binary solutions above, the requirement that $E[X] = m$ is insufficient to fix the probabilities and as an additional equation one should use the Lagrange linearity condition: $P(S_{n-1} \le t - x)$ is linear for $x \in \{0, a, 1\}$. This results in

$$(1 - a)\, P(t - a < S_{n-1} \le t) = a\, P(t - 1 < S_{n-1} \le t - a).$$

Since $S_{n-1}$ is distributed as $aN_a + N_1$, where $(N_0, N_a, N_1)$ have a trinomial distribution with parameters $n - 1$, $p = P(X = 1)$, $q = P(X = a)$ and $r = P(X = 0)$, this last requirement is a polynomial equation in $p$, $q$, and $r$. The requirements $aq + p = m$ and $p + q + r = 1$ can be used to eliminate $q$ and $r$, leaving a polynomial equation of order $n - 1$ in $p$.
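The polynomial equation in $p$ can be solved numerically for a given $(n, m, t, k, l)$. The Python sketch below is only one possible approach (grid scan for sign changes followed by bisection, our choice rather than a method from the paper); all names are hypothetical. It eliminates $q = (m - p)/a$ and $r = 1 - p - q$, evaluates the linearity residual with trinomial probabilities, and returns each solution together with its value $P(S_n \le t)$. For $n = 2$, $k = 1$, $l = 0$ (so $a = t$) it should reproduce the $\{0, t, 1\}$ solution (18).

```python
import math


def trinomial(N, r, q, p):
    """Yield (n_a, n_1, prob) for (N_0, N_a, N_1) ~ Multinomial(N; r, q, p)."""
    for na in range(N + 1):
        for n1 in range(N + 1 - na):
            n0 = N - na - n1
            coef = math.factorial(N) // (math.factorial(n0) * math.factorial(na) * math.factorial(n1))
            yield na, n1, coef * r**n0 * q**na * p**n1


def cdf(N, a, probs, x, tol=1e-12):
    """P(a N_a + N_1 <= x) for the sum of N i.i.d. copies of X on {0, a, 1}."""
    r, q, p = probs
    return sum(w for na, n1, w in trinomial(N, r, q, p) if a * na + n1 <= x + tol)


def ternary_candidates(n, m, t, k, l, steps=2000):
    """Solutions (probs, P(S_n <= t)) of the linearity condition for support {0, a, 1}, a = (t - l)/k."""
    a = (t - l) / k
    if not 0 < a < 1:
        return []

    def probs(p):
        q = (m - p) / a           # from a q + p = m
        return (1 - p - q, q, p)  # (r, q, p) = (P(X=0), P(X=a), P(X=1))

    def g(p):  # linearity residual (1 - a) P(t-a < S <= t) - a P(t-1 < S <= t-a)
        pr, N = probs(p), n - 1
        left = cdf(N, a, pr, t) - cdf(N, a, pr, t - a)
        right = cdf(N, a, pr, t - a) - cdf(N, a, pr, t - 1)
        return (1 - a) * left - a * right

    lo, hi = max(0.0, (m - a) / (1 - a)), m  # keeps r >= 0 and q >= 0
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(p) for p in grid]
    roots = []
    for i in range(steps):
        p0, p1, g0, g1 = grid[i], grid[i + 1], vals[i], vals[i + 1]
        if g0 == 0.0:
            roots.append(p0)
        elif g0 * g1 < 0:
            for _ in range(60):  # bisection on the sign change
                mid = (p0 + p1) / 2
                gm = g(mid)
                if g0 * gm <= 0:
                    p1 = mid
                else:
                    p0, g0 = mid, gm
            roots.append((p0 + p1) / 2)
    return [(probs(p), cdf(n, a, probs(p), t)) for p in roots]


sols = ternary_candidates(2, 0.8, 0.85, 1, 0)
```

A grid scan can miss pairs of roots that fall between two grid points, so for larger $n$ the `steps` parameter (or an exact polynomial solver) deserves attention.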